{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# GPT as both Reranker and Reader"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below we will demonstrate how to build an open-domain QA pipeline, using the ChatGPT as both a reranker module and also a reader to generate our answer.\n",
    "\n",
    "We will use a `TfidfRetriever` retriever, a `DocumentLister` to modify our documents into a single document, and a `PromptNode` model to access ChatGPT (using the API provided by OpenAI (https://platform.openai.com/docs/api-reference/chat/create)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create Document Store\n",
    "\n",
    "First, we have a list of 20 passages about former president Barack Obama, fetched from a Wikipedia Elastic Index. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from haystack.document_stores import InMemoryDocumentStore"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "passages = [\n",
    "   {\n",
    "      \"title\":\"Barack Obama (disambiguation)\",\n",
    "      \"text\":\"Barack Obama (disambiguation) Barack Obama (born 1964) is an American attorney and politician who served as the 44th President of the United States from 2009 to 2017.  Barack Obama may also refer to :\"\n",
    "   },\n",
    "   {\n",
    "      \"title\":\"Barack Obama\",\n",
    "      \"text\":\"Barack Obama Barack Hussein Obama II (; born August 4, 1961) is an American attorney and politician who served as the 44th president of the United States from 2009 to 2017. A member of the Democratic Party, he was the first African American to be elected to the presidency. He previously served as a U.S. senator from Illinois from 2005 to 2008 and an Illinois state senator from 1997 to 2004. Obama was born in Honolulu, Hawaii. After graduating from Columbia University in 1983, he worked as a community organizer in Chicago. In 1988, he enrolled in Harvard Law School,\"\n",
    "   },\n",
    "   {\n",
    "      \"title\":\"Barack Obama Sr.\",\n",
    "      \"text\":\"Barack Obama Sr. Barack Hussein Obama Sr. (; 18 June 1936 – 24 November 1982) was a Kenyan senior governmental economist and the father of Barack Obama, the 44th President of the United States. He is a central figure of his son\\\\'s memoir, \\\"Dreams from My Father\\\" (1995). Obama married in 1954 and had two children with his first wife, Kezia. He was selected for a special program to attend college in the United States and studied at the University of Hawaii. There, Obama met Stanley Ann Dunham, whom he married in 1961, and with whom he had a son,\"\n",
    "   },\n",
    "   {\n",
    "      \"title\":\"Barack Obama in comics\",\n",
    "      \"text\":\"the face. Barack Obama is the subject of graphic novel \\\"Barack Hussein Obama\\\" by Steven Weissman. In this, President Obama and his cast of characters (Secretary Clinton, VP Joe Biden, his family) experience life in a parallel universe. Barack Obama has also appeared in Archie Comics Veronica #199, and Archie #616 and #617. President Obama was in the Flashpoint Storyline of DC comics of 2011. He discusses the earth members of The Green Lantern Corp with Amanda Waller.\"\n",
    "   },\n",
    "   {\n",
    "      \"title\":\"Family of Barack Obama\",\n",
    "      \"text\":\"Family of Barack Obama The family of Barack Obama, the 44th President of the United States, and his wife Michelle Obama is made up of people of Kenyan (Luo), African-American, and Old Stock American (including originally English, Scots-Irish, Welsh, German, and Swiss) ancestry. Their immediate family was the First Family of the United States from 2009 to 2017. The Obamas are the first First Family of African-American descent. Michelle LaVaughn Robinson Obama (born January 17, 1964) is an American lawyer, university administrator, and writer who served as the First Lady of the United States from 2009 to 2017. She is\"\n",
    "   },\n",
    "   {\n",
    "      \"title\":\"Bibliography of Barack Obama\",\n",
    "      \"text\":\"Bibliography of Barack Obama This bibliography of Barack Obama is a list of written and published works, both books and films, about Barack Obama, 44th President of the United States.\"\n",
    "   },\n",
    "   {\n",
    "      \"title\":\"President Barack Obama (painting)\",\n",
    "      \"text\":\"President Barack Obama (painting) President Barack Obama is a 2018 portrait of Barack Obama by the artist Kehinde Wiley for the National Portrait Gallery. In October 2017, it was announced that Wiley had been chosen by Barack Obama to paint an official portrait of the former president to appear in Smithsonian\\\\'s National Portrait Gallery \\\"America\\\\'s Presidents\\\" exhibition. The painting depicts Obama sitting in a chair seemingly floating among foliage. The foliage is described by the author as \\\"chrysanthemums (the official flower of Chicago), jasmine (symbolic of Hawaii where the president spent most of his childhood) and African blue lilies (alluding\"\n",
    "   },\n",
    "   {\n",
    "      \"title\":\"Barack Obama presidential campaign\",\n",
    "      \"text\":\"Barack Obama presidential campaign Barack Obama, the 44th President of the United States, has successfully run for president twice: Barack Obama presidential campaign may refer to:\"\n",
    "   },\n",
    "   {\n",
    "      \"title\":\"Barack Obama: The Story\",\n",
    "      \"text\":\"Barack Obama: The Story Barack Obama: The Story is a book written by David Maraniss on the life of United States President Barack Obama. The biography was published on June 19, 2012.\"\n",
    "   },\n",
    "   {\n",
    "      \"title\":\"Barack Obama Day\",\n",
    "      \"text\":\"Barack Obama Day Barack Obama Day refers to two days of recognition in the United States in honor of Barack Obama, who served as the 44th President of the United States from 2009 to 2017. The State of Illinois celebrates the commemorative holiday every August 4, which is Obama's birthday, beginning in 2018. Obama was a member of the Illinois Senate from 1997 to 2004 and represented the state in the United States Senate from 2005 to 2008 before becoming president. Similar to other commemorative holidays, it is not a legal state holiday, meaning workplaces are not closed on the\"\n",
    "   },\n",
    "   {\n",
    "      \"title\":\"Barack Obama \\\"Joker\\\" poster\",\n",
    "      \"text\":\"Barack Obama &quot;Joker&quot; poster The Barack Obama \\\"Joker\\\" poster is a digitally manipulated image of United States President Barack Obama, designed by Firas Alkhateeb in January 2009, that was adopted by some critics of the Obama administration and described as the most famous anti-Obama image. The image portrays Obama as comic book supervillain the Joker, based on the portrayal by Heath Ledger in \\\"The Dark Knight\\\" (2008). Alkhateeb has said the image was not intended to make a political statement. He uploaded the image to the photo-sharing website Flickr, from where it was downloaded by an unknown individual who added\"\n",
    "   },\n",
    "   {\n",
    "      \"title\":\"Barack Obama in comics\",\n",
    "      \"text\":\"who proclaims that he is endorsing him for president. The issue sold out four print runs. A month later, the comic was followed up by \\\"Presidential Material: Barack Obama\\\" by Jeff Mariotte and in November 2008 with \\\"Obama: The Comic Book\\\" by Rod Espinosa. In November 2008, two things led to an explosion in popularity of the Obama comic book character. One of Obama\\\\'s advisers gave an interview to journalist Jon Swaine of \\\"The Daily Telegraph\\\" titled, \\\"Barack Obama: The 50 facts you might not know.\\\" In the interview, it emerged that Obama collects \\\"Spider-Man and Conan the Barbarian.\\\" Then\"\n",
    "   },\n",
    "   {\n",
    "      \"title\":\"Family of Barack Obama\",\n",
    "      \"text\":\"Akumu Obama. She is the sole surviving full sibling of Barack Obama Sr. Sarah Onyango Obama was the third wife of Obama's paternal grandfather. She is known for short as Sarah Obama; she is sometimes referred to as Sarah Ogwel, Sarah Hussein Obama, or Sarah Anyango Obama. She lives in Nyang'oma Kogelo village, 30 miles west of western Kenya's main town, Kisumu, on the edge of Lake Victoria. (She should not be confused with her stepdaughter of the same name, Sarah Obama, a daughter of Onyango's second wife Akumu.) Although she is not a blood relation, Barack Obama calls her\"\n",
    "   },\n",
    "   {\n",
    "      \"title\":\"Barack Obama\",\n",
    "      \"text\":\"11th consecutive year, although Dwight D. Eisenhower was selected most admired in twelve non-consecutive years. Obama was born on August 4, 1961, at Kapiolani Medical Center for Women and Children in Honolulu, Hawaii. He is the only president who was born outside of the contiguous 48 states. He was born to a white mother and a black father. His mother, Ann Dunham (1942–1995), was born in Wichita, Kansas; she was mostly of English descent, with some German, Irish (3.13%), Scottish, Swiss, and Welsh ancestry. His father, Barack Obama Sr. (1936–1982), was a Luo Kenyan from Nyang'oma Kogelo. Obama's parents met\"\n",
    "   },\n",
    "   {\n",
    "      \"title\":\"Barack Obama\",\n",
    "      \"text\":\"since Democratic President Jimmy Carter. By contrast, the federal prison population increased significantly under presidents Ronald Reagan, George H. W. Bush, Bill Clinton, and George W. Bush. Obama left office in January 2017 with a 60% approval rating. A 2017 C-SPAN \\\"Presidential Historians Survey\\\" ranked Obama as the 12th-best US president. The Barack Obama Presidential Center is Obama\\\\'s planned presidential library. It will be hosted by the University of Chicago and located in Jackson Park on the South Side of Chicago.\"\n",
    "   },\n",
    "   {\n",
    "      \"title\":\"Barack Obama Plaza\",\n",
    "      \"text\":\"Barack Obama Plaza Barack Obama Plaza (Moneygall services), is an off line service area, at Junction 23 of the M7 on the outskirts of the village of Moneygall in Counties Tipperary, Ireland. The plaza was opened for business in June 2014 and is accessed using the existing junction 23 slip roads. It is named after US president Barack Obama, whose 3rd great grandfather lived nearby. The ancestor was reportedly Falmouth Kearney, who emigrated to the US in 1850. The service station cost 7 million euros to construct, and opened in 2014. The Plaza is owned and operated by Supermacs Ireland\"\n",
    "   },\n",
    "   {\n",
    "      \"title\":\"Barack Obama Academy\",\n",
    "      \"text\":\"Barack Obama Academy Barack Obama Academy is a small alternative middle school in Oakland, California. It is part of the Oakland Unified School District. It became notable as the first middle school in the United States to be officially named or renamed after US President Barack Obama in March 2009. The middle school, which opened in 2007, was formerly known as the Alternative Learning Community. The name change was prompted by the school's students. As of 2011 it had 24 students, most of whom were low income African Americans.\"\n",
    "   },\n",
    "   {\n",
    "      \"title\":\"Inauguration of Barack Obama\",\n",
    "      \"text\":\"Inauguration of Barack Obama Inauguration of Barack Obama may refer to:\"\n",
    "   },\n",
    "   {\n",
    "      \"title\":\"Barack Obama Day\",\n",
    "      \"text\":\"Bill 55, which designated August 4 as Barack Obama Day but did not make it an official state holiday. The bill passed both houses of the Illinois General Assembly with no votes against, and was signed into law by Illinois Governor Bruce Rauner on August 4, 2017. The bill amended the State Commemorative Dates Act to include a new section: Barack Obama Day. August 4th of each year is designated as Barack Obama Day, to be observed throughout the State as a day set apart to honor the 44th President of the United States of America who began his career\"\n",
    "   },\n",
    "   {\n",
    "      \"title\":\"Protests against Barack Obama\",\n",
    "      \"text\":\"Protests against Barack Obama Protests against Barack Obama occurred throughout the United States during Barack Obama's 2008 presidential campaign and during Obama's presidency. During the 2008 presidential election, particularly in the lead up to November 4, election day, numerous incidents against Obama were documented.\"\n",
    "   }\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "document_store = InMemoryDocumentStore()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from haystack.schema import Document\n",
    "\n",
    "documents = []\n",
    "for i, passage in enumerate(passages):\n",
    "    documents.append(Document(content=passage[\"text\"], meta={\"title\": passage[\"title\"]}, id=i))\n",
    "\n",
    "document_store.write_documents(documents)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialize the pipeline components"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Initialize the components we are going to use in our pipeline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from haystack.nodes import TfidfRetriever"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = TfidfRetriever(top_k=20, document_store=document_store)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a DocumentLister, which takes the retrieved documents, and combines them into a single document, giving the prefix of \"Paragraph $n$: \" for the $n^{th}$ paragraph:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "from fastrag.prompters.document_shapers.document_lister import DocumentLister"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "document_lister = DocumentLister()"
   ]
  },
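  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of the listing step, the plain-Python sketch below mimics the format described above. `list_paragraphs` is a hypothetical helper written only for this illustration, not part of fastRAG, and the real `DocumentLister` may differ in details such as the separator between paragraphs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def list_paragraphs(texts):\n",
    "    # Hypothetical helper (not fastRAG's API): prefix each text with its\n",
    "    # 1-based paragraph number and join everything into a single string.\n",
    "    return \"\\n\\n\".join(f\"Paragraph {n}: {text}\" for n, text in enumerate(texts, start=1))\n",
    "\n",
    "\n",
    "print(list_paragraphs([\"First passage text.\", \"Second passage text.\"]))"
   ]
  },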
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Initialize PromptNode:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "from haystack.pipelines import Pipeline\n",
    "from haystack.nodes import PromptNode, PromptTemplate\n",
    "from haystack.schema import Document\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "api_key=\"YOUR API KEY\""
   ]
  },
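  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To avoid hardcoding the key in the notebook, one option is to read it from an environment variable. The sketch below assumes the key was exported as `OPENAI_API_KEY` (the variable name is our assumption) and falls back to the placeholder otherwise."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# Assumption: the key is exported as OPENAI_API_KEY; otherwise keep the placeholder.\n",
    "api_key = os.environ.get(\"OPENAI_API_KEY\", \"YOUR API KEY\")"
   ]
  },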
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We create a prompt that asks the model to choose the three best paragraphs, and use them to answer the question:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "from haystack.nodes.prompt import AnswerParser"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "lfqa_prompt = PromptTemplate(name=\"lfqa\",\n",
    "                             prompt_text=\"From the following paragraphs, choose the top three best paragraphs to answer the question: {query} Then use them to answer the question: {query} \\n\\n {join(documents)} Answer:\",\n",
    "                             output_parser=AnswerParser()) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, the PromptNode will use the template described above, and dynmically assign the {query} and {join(documents)} tags with the relevant information:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[03/30/2023 15:13:42] {prompt_node.py:451} WARNING - PromptNode has been potentially initialized with a language model not fine-tuned on instruction following tasks. Many of the default prompts and PromptTemplates will likely not work as intended. Use custom prompts and PromptTemplates specific to the gpt-3.5-turbo model\n"
     ]
    }
   ],
   "source": [
    "prompter = PromptNode(\"gpt-3.5-turbo\", default_prompt_template=lfqa_prompt, api_key=api_key)"
   ]
  },
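  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The placeholder substitution performed by the PromptNode can be pictured with the plain-Python sketch below. It only illustrates the idea: `fill_template` is a hypothetical helper and the shortened template is made up for this example, while Haystack's actual template rendering is more general."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def fill_template(template, query, documents):\n",
    "    # Hypothetical helper: replace the two placeholders with the query and\n",
    "    # the newline-joined document contents.\n",
    "    joined = \"\\n\".join(documents)\n",
    "    return template.replace(\"{query}\", query).replace(\"{join(documents)}\", joined)\n",
    "\n",
    "\n",
    "template = \"Question: {query} \\n\\n {join(documents)} Answer:\"\n",
    "print(fill_template(template, \"Who is Barack Obama?\", [\"Paragraph 1: ...\", \"Paragraph 2: ...\"]))"
   ]
  },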
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "from haystack import Pipeline\n",
    "\n",
    "p = Pipeline()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Add the components in the right order"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "p.add_node(component=retriever, name=\"Retriever\", inputs=[\"Query\"])\n",
    "p.add_node(component=document_lister, name=\"Joiner\", inputs=[\"Retriever\"])\n",
    "p.add_node(component=prompter, name=\"Prompter\", inputs=[\"Joiner\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Run a query through the pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[03/30/2023 15:13:55] {openai_utils.py:180} WARNING - 1 out of the 1 completions have been truncated before reaching a natural stopping point. Increase the max_tokens parameter to allow for longer completions.\n"
     ]
    }
   ],
   "source": [
    "res = p.run(\n",
    "    query=\"Who is Barack Obama?\",\n",
    "    params = {\n",
    "        \"Retriever\": {\n",
    "            \"top_k\": 5\n",
    "        }\n",
    "    },\n",
    "    debug=True\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Display the answer:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The top three best paragraphs to answer the question \"Who is Barack Obama?\" are Paragraph 1, Paragraph 3, and Paragraph 4. \n",
      "\n",
      "Using these paragraphs, we can answer the question: Barack Obama is an American attorney and politician who served as the 44th President of the United States from 2009 to 2017. He is the subject of a biography written by David Maraniss titled Barack Obama: The Story. Obama has successfully run for president twice, as seen in\n"
     ]
    }
   ],
   "source": [
    "print(res['answers'][0].answer)"
   ]
  },
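  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the model names the paragraphs it selected, the implicit reranking can be recovered from the answer text. The sketch below is one possible way to do this with a regular expression. `selected_paragraphs` is a hypothetical helper, the example string is abbreviated, and the approach assumes the model lists its choices before the first blank line, which is not guaranteed for every response."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "\n",
    "def selected_paragraphs(answer_text):\n",
    "    # Hypothetical helper: look only at the text before the first blank line,\n",
    "    # where the model lists its chosen paragraphs, and collect their numbers.\n",
    "    first_block = answer_text.split(\"\\n\\n\", 1)[0]\n",
    "    return [int(n) for n in re.findall(r\"Paragraph\\s+(\\d+)\", first_block)]\n",
    "\n",
    "\n",
    "example = \"The top three best paragraphs are Paragraph 1, Paragraph 3, and Paragraph 4.\\n\\nUsing these paragraphs, ...\"\n",
    "print(selected_paragraphs(example))"
   ]
  },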
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And this is the final prompt used for the answer generation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "From the following paragraphs, choose the top three best paragraphs to answer the question: Who is Barack Obama? Then use them to answer the question: Who is Barack Obama? \n",
      "\n",
      " Paragraph 1: Barack Obama (disambiguation) Barack Obama (born 1964) is an American attorney and politician who served as the 44th President of the United States from 2009 to 2017.  Barack Obama may also refer to :\n",
      "\n",
      "Paragraph 2: Bibliography of Barack Obama This bibliography of Barack Obama is a list of written and published works, both books and films, about Barack Obama, 44th President of the United States.\n",
      "\n",
      "Paragraph 3: Barack Obama: The Story Barack Obama: The Story is a book written by David Maraniss on the life of United States President Barack Obama. The biography was published on June 19, 2012.\n",
      "\n",
      "Paragraph 4: Barack Obama presidential campaign Barack Obama, the 44th President of the United States, has successfully run for president twice: Barack Obama presidential campaign may refer to:\n",
      "\n",
      "Paragraph 5: Inauguration of Barack Obama Inauguration of Barack Obama may refer to:\n",
      "\n",
      " Answer:\n"
     ]
    }
   ],
   "source": [
    "print(res['_debug']['Prompter']['runtime']['prompts_used'][0])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  },
  "vscode": {
   "interpreter": {
    "hash": "01262dede09baa68616418263efd26d33bafc508f82c218954990f624836b45d"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
